Counterfactual Multi-Agent Policy Gradients
Cooperative multi-agent systems can be naturally used to model many real
world problems, such as network packet routing and the coordination of
autonomous vehicles. There is a great need for new reinforcement learning
methods that can efficiently learn decentralised policies for such systems. To
this end, we propose a new multi-agent actor-critic method called
counterfactual multi-agent (COMA) policy gradients. COMA uses a centralised
critic to estimate the Q-function and decentralised actors to optimise the
agents' policies. In addition, to address the challenges of multi-agent credit
assignment, it uses a counterfactual baseline that marginalises out a single
agent's action, while keeping the other agents' actions fixed. COMA also uses a
critic representation that allows the counterfactual baseline to be computed
efficiently in a single forward pass. We evaluate COMA in the testbed of
StarCraft unit micromanagement, using a decentralised variant with significant
partial observability. COMA significantly improves average performance over
other multi-agent actor-critic methods in this setting, and the best performing
agents are competitive with state-of-the-art centralised controllers that get
access to the full state.
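As a minimal sketch of the counterfactual baseline described above: the advantage of one agent's chosen action is its Q-value minus the policy-weighted average of the Q-values over that agent's alternative actions, with the other agents' actions held fixed. The function name, argument names, and the numbers are hypothetical and chosen for illustration; they are not taken from the COMA implementation, and the critic that would produce the Q-values is assumed to exist elsewhere.

```python
def counterfactual_advantage(q_per_action, policy, action_taken):
    """Advantage of one agent's action against a counterfactual baseline.

    q_per_action[i]: critic's Q-value if this agent had taken action i,
                     with all other agents' actions kept fixed (assumed given).
    policy[i]:       this agent's probability of choosing action i.
    """
    # Baseline: marginalise out this agent's action under its own policy.
    baseline = sum(p * q for p, q in zip(policy, q_per_action))
    return q_per_action[action_taken] - baseline


# Illustrative numbers only.
q_values = [1.0, 2.0, 4.0]
policy = [0.5, 0.25, 0.25]
advantage = counterfactual_advantage(q_values, policy, action_taken=2)
```

Because the baseline depends only on the agent's own policy and not on the action it actually took, subtracting it leaves the expected policy gradient unchanged while isolating that agent's contribution.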
Value Propagation Networks
We present Value Propagation (VProp), a set of parameter-efficient
differentiable planning modules built on Value Iteration which can successfully
be trained using reinforcement learning to solve unseen tasks, have the
capability to generalize to larger map sizes, and can learn to navigate in
dynamic environments. We show that the modules enable learning to plan when the
environment also includes stochastic elements, providing a cost-efficient
learning system to build low-level size-invariant planners for a variety of
interactive navigation problems. We evaluate on static and dynamic
configurations of MazeBase grid-worlds, with randomly generated environments of
several different sizes, and on a StarCraft navigation scenario, with more
complex dynamics, and pixels as input.
Comment: Updated to match ICLR 2019 OpenReview's version.
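For orientation, the following is a minimal sketch of classical tabular Value Iteration, the algorithm the VProp modules are built on. It is not the VProp architecture itself, which is learned and differentiable. The environment is a hypothetical one-dimensional corridor with the goal in the last cell; the function name, reward scheme, and discount factor are assumptions made for illustration.

```python
def value_iteration(n_states=4, gamma=0.9, tol=1e-9):
    """Tabular value iteration on a toy corridor (hypothetical environment).

    States are cells 0..n_states-1; the last cell is a terminal goal.
    Actions move one cell left or right (clamped at the walls), and
    entering the goal yields reward 1.
    """
    goal = n_states - 1
    values = [0.0] * n_states
    while True:
        new_values = values[:]
        delta = 0.0
        for s in range(n_states):
            if s == goal:
                continue  # terminal state keeps value 0
            candidates = []
            for step in (-1, 1):
                nxt = min(max(s + step, 0), goal)
                reward = 1.0 if nxt == goal else 0.0
                candidates.append(reward + gamma * values[nxt])
            # Bellman optimality backup: best action value.
            new_values[s] = max(candidates)
            delta = max(delta, abs(new_values[s] - values[s]))
        values = new_values
        if delta < tol:
            return values


corridor_values = value_iteration()
```

Values decay geometrically with distance from the goal, so acting greedily with respect to them walks the agent toward it.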
Counterfactual Reasoning about Intent for Interactive Navigation in Dynamic Environments
Many modern robotics applications require robots to function autonomously in
dynamic environments including other decision making agents, such as people or
other robots. This calls for fast and scalable interactive motion planning.
This requires models that take into consideration the other agent's intended
actions in one's own planning. We present a real-time motion planning framework
that brings together a few key components including intention inference by
reasoning counterfactually about potential motion of the other agents as they
work towards different goals. By using a light-weight motion model, we achieve
efficient iterative planning for fluid motion when avoiding pedestrians, in
parallel with goal inference for longer range movement prediction. This
inference framework is coupled with a novel distributed visual tracking method
that provides reliable and robust models for the current belief-state of the
monitored environment. This combined approach represents a computationally
efficient alternative to previously studied policy learning methods that often
require significant offline training or calibration and do not yet scale to
densely populated environments. We validate this framework with experiments
involving multi-robot and human-robot navigation. We further validate the
tracker component separately on much larger scale unconstrained pedestrian data
sets.
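One way to picture counterfactual goal inference is the sketch below: for each candidate goal, predict the velocity the agent would have had if it were heading straight for that goal, then weight the goal by how well that prediction matches the observed velocity. This is an illustrative assumption, not the framework's actual motion model; the Gaussian likelihood, the uniform prior over goals, the `sigma` parameter, and all names are hypothetical.

```python
import math


def goal_posterior(position, velocity, goals, sigma=0.5):
    """Posterior over candidate goals from one observed velocity.

    Assumes a uniform prior over goals and a Gaussian error between the
    observed velocity and the velocity predicted for each goal.
    """
    speed = math.hypot(velocity[0], velocity[1])
    weights = []
    for gx, gy in goals:
        dx, dy = gx - position[0], gy - position[1]
        dist = math.hypot(dx, dy) or 1.0  # avoid division by zero at the goal
        # Counterfactual motion: same speed, heading straight at this goal.
        pred = (speed * dx / dist, speed * dy / dist)
        err2 = (velocity[0] - pred[0]) ** 2 + (velocity[1] - pred[1]) ** 2
        weights.append(math.exp(-err2 / (2.0 * sigma ** 2)))
    total = sum(weights)
    return [w / total for w in weights]


# A pedestrian at the origin moving mostly along +x; two candidate goals.
posterior = goal_posterior((0.0, 0.0), (1.0, 0.1), [(5.0, 0.0), (0.0, 5.0)])
```

In this example almost all of the probability mass falls on the first goal, which lies along the observed direction of travel.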
Simulation-Based Inference for Global Health Decisions
The COVID-19 pandemic has highlighted the importance of in-silico
epidemiological modelling in predicting the dynamics of infectious diseases to
inform health policy and decision makers about suitable prevention and
containment strategies. Work in this setting involves solving challenging
inference and control problems in individual-based models of ever increasing
complexity. Here we discuss recent breakthroughs in machine learning,
specifically in simulation-based inference, and explore its potential as a
novel avenue for model calibration to support the design and evaluation of
public health interventions. To further stimulate research, we are developing
software interfaces that turn two cornerstone COVID-19 and malaria epidemiology
models, COVID-sim (https://github.com/mrc-ide/covid-sim/) and OpenMalaria
(https://github.com/SwissTPH/openmalaria), into probabilistic programs, enabling
efficient interpretable Bayesian inference within those simulators.
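To illustrate the general idea of simulation-based inference, here is a minimal rejection-sampling sketch (approximate Bayesian computation): draw a parameter from the prior, run the simulator, and keep the parameter when the simulated outcome is close to the observation. The toy simulator below is hypothetical and stands in for a real epidemiological model; it is not an interface to COVID-sim or OpenMalaria. The uniform prior, the tolerance, and the observed count are likewise assumptions.

```python
import random


def simulate_cases(theta, population, rng):
    """Toy simulator (hypothetical): each individual is infected with probability theta."""
    return sum(1 for _ in range(population) if rng.random() < theta)


def rejection_abc(observed, population=100, draws=2000, tolerance=5, seed=0):
    """Keep prior draws whose simulated case count lies within `tolerance` of `observed`."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(draws):
        theta = rng.random()  # uniform prior on [0, 1)
        simulated = simulate_cases(theta, population, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(theta)
    return accepted


samples = rejection_abc(observed=30)
posterior_mean = sum(samples) / len(samples)
```

The accepted draws approximate the posterior over the infection probability. Rejection sampling scales poorly to expensive simulators, which is one motivation for the more efficient inference methods the abstract refers to.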
Lessons from reinforcement learning for biological representations of space
Neuroscientists postulate 3D representations in the brain in a variety of different coordinate frames (e.g. 'head-centred', 'hand-centred' and 'world-based'). Recent advances in reinforcement learning demonstrate a quite different approach that may provide a more promising model for biological representations underlying spatial perception and navigation. In this paper, we focus on reinforcement learning methods that reward an agent for arriving at a target image without any attempt to build up a 3D 'map'. We test the ability of this type of representation to support geometrically consistent spatial tasks such as interpolating between learned locations using decoding of feature vectors. We introduce a hand-crafted representation that has, by design, a high degree of geometric consistency and demonstrate that, in this case, information about the persistence of features as the camera translates (e.g. distant features persist) can improve performance on the geometric tasks. These examples avoid Cartesian (in this case, 2D) representations of space. Non-Cartesian, learned representations provide an important stimulus in neuroscience to the search for alternatives to a 'cognitive map'.